A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets†
نویسندگان
چکیده
The integration of high-throughput genomic data represents an opportunity for deciphering the interplay between structural and functional organization of genomes and for discovering novel biomarkers. However, the development of integrative approaches to complement gene expression (GE) data with other types of gene information, such as copy number (CN) and chromosomal localization, still represents a computational challenge in the genomic arena. This work presents a computational procedure that directly integrates CN and GE profiles at genome-wide level. When applied to DNA/RNA paired data, this approach leads to the identification of Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions (SODEGIR). This goal is accomplished in three steps. The first step extends to CN a method for detecting regional imbalances in GE. The second part provides the integration of CN and GE data and identifies chromosomal regions with concordantly altered genomic and transcriptional status in a tumor sample. The last step elevates the single-sample analysis to an entire dataset of tumor specimens. When applied to study chromosomal aberrations in a collection of astrocytoma and renal carcinoma samples, the procedure proved to be effective in identifying discrete chromosomal regions of coordinated CN alterations and changes in transcriptional levels.
منابع مشابه
Imbalanced Class Learning in Epigenetics
In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes face hindrance in learning using any classifier. Datasets having a magnitude difference in number of instances between the target concept result in an imbalanced class distribution. Such datasets can range from biological ...
متن کاملThe Application of a Non-Radioactive DD-AFLP Method for Profiling of Aeluropus lagopoides Differentially Expressed Transcripts under Salinity or Drought Conditions
Aeluropus lagopoides is a salt and drought tolerant grass from Poaceae family, distributed widely in arid regions. There is almost no information about the genetics or genome of this close relative of wheat that stands harsh conditions of deserts. Differential Display Amplified fragment length polymorphism (DD-AFLP) led to the improvement of a non-radioactive method for which many parameters we...
متن کاملIntegration of 198 ChIP-seq Datasets Reveals Human cis-Regulatory Regions
We analyzed 198 datasets of chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Althoug...
متن کاملIntegrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes
Progress in understanding complex genetic diseases has been bolstered by synthetic approaches that overlay diverse data types and analyses to identify functionally important genes. Pre-term birth (PTB), a major complication of pregnancy, is a leading cause of infant mortality worldwide. A major obstacle in addressing PTB is that the mechanisms controlling parturition and birth timing remain poo...
متن کاملتخمین مکان نواحی کدکننده پروتئین در توالی عددی DNA با استفاده پنجره با طول متغیر بر مبنای منحنی سه بعدی Z
In recent years, estimation of protein-coding regions in numerical deoxyribonucleic acid (DNA) sequences using signal processing tools has been a challenging issue in bioinformatics, owing to their 3-base periodicity. Several digital signal processing (DSP) tools have been applied in order to Identify the task and concentrated on assigning numerical values to the symbolic DNA sequence, then app...
متن کامل